Introduction to ggplot2

The most popular data visualization framework in R

PHAC R Usergroup: Basics of R Summer Camp

2025-07-23

What is ggplot2?

ggplot2 consistently ranks among the most downloaded R packages on CRAN

ggplot2 overview

  • Part of the tidyverse suite of packages, ggplot2 offers great consistency with dplyr pipes and syntax, with one very notable exception: the use of + instead of %>% to connect functions/verbs

  • “gg” refers to “The Grammar of Graphics,” a 1999 book from Leland Wilkinson that laid out a framework for creating statistical graphics in layered components

  • The “2” in ggplot2 reflects that this package supersedes an older, less efficient package called ggplot. Because the developer wanted to support legacy usage of the older package, both were maintained for a period of time. The main function in ggplot2 is nevertheless still called ggplot()
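To make the + versus %>% distinction concrete, here is a minimal sketch. It assumes the dplyr package is installed and uses the built-in mtcars dataset; neither comes from these slides.

```r
library(dplyr)
library(ggplot2)

# Data-wrangling steps are chained with the pipe ...
p <- mtcars %>%
  filter(cyl != 6) %>%               # dplyr verb, connected with %>%
  ggplot(aes(x = wt, y = mpg)) +     # from ggplot() onward, components ...
  geom_point()                       # ... are added with +

p
```

Using %>% after ggplot() by mistake is a common slip; ggplot2 stops with an error in that case rather than drawing the plot.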

Anatomy of a plot

https://ggplot2.tidyverse.org/articles/ggplot2.html


Getting started - Data

tail(ggplot_downloads_df)
           date count package
3829 2025-06-25 54021 ggplot2
3830 2025-06-26 50323 ggplot2
3831 2025-06-27 50306 ggplot2
3832 2025-06-28 31488 ggplot2
3833 2025-06-29 33328 ggplot2
3834 2025-06-30 48941 ggplot2
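The slides do not show how ggplot_downloads_df was built. Its columns (date, count, package) match what the cranlogs package returns, so one plausible way to get a similar table is sketched below. Both the use of cranlogs and the 2015-01-01 start date are assumptions: the start date is inferred only from the fact that 3,834 daily rows ending on 2025-06-30 would begin on that day. The helper name fetch_downloads is hypothetical.

```r
# Assumed date range: inferred from the row numbers in the output above,
# not stated in the slides.
start_date <- as.Date("2015-01-01")
end_date   <- as.Date("2025-06-30")

# One row per day, both endpoints included
n_days <- as.integer(end_date - start_date) + 1L
n_days

# Hypothetical helper: daily CRAN download counts for one package.
# Needs the cranlogs package and an internet connection when called.
fetch_downloads <- function(pkg, from, to) {
  cranlogs::cran_downloads(
    packages = pkg,
    from     = format(from),   # dates passed as "yyyy-mm-dd" strings
    to       = format(to)
  )
}

# Only attempt the download in an interactive session
if (interactive()) {
  ggplot_downloads_df <- fetch_downloads("ggplot2", start_date, end_date)
  tail(ggplot_downloads_df)
}
```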

Getting started - Plot

library(ggplot2)

# ggplot2 needs the user to provide at least the following three components: 
# data, mapping, and layer

g2 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point()                  # Layer

g2
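The three required components can be seen one at a time in a small, self-contained sketch. It uses only the six rows printed by tail() earlier, typed in by hand, so the picture will not match the full plot; the object names are illustrative.

```r
library(ggplot2)

# The six rows shown in the tail() output, entered manually
last_six <- data.frame(
  date    = as.Date("2025-06-25") + 0:5,
  count   = c(54021, 50323, 50306, 31488, 33328, 48941),
  package = "ggplot2"
)

# Data + mapping only: axes are drawn, but no layer means nothing is displayed
canvas <- ggplot(last_six, aes(x = date, y = count))

# Adding a geom supplies the layer
with_points <- canvas + geom_point()

# The mapping can also be given inside the layer; the result looks the same
same_plot <- ggplot(last_six) +
  geom_point(aes(x = date, y = count))

with_points
```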

Select plot types

https://ouyanglab.com/covid19dataviz/ggplot2.html
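Changing the plot type mostly comes down to swapping the geom. The sketch below uses the economics dataset that ships with ggplot2 as a stand-in for the download data; the choice of geoms is illustrative, not a list taken from the linked page.

```r
library(ggplot2)

# Shared data and mapping, reused with different geoms
base <- ggplot(economics, aes(x = date, y = unemploy))

p_points <- base + geom_point()   # scatter plot
p_line   <- base + geom_line()    # line chart
p_area   <- base + geom_area()    # filled area chart

# A histogram needs only one variable, so it gets its own mapping
p_hist <- ggplot(economics, aes(x = unemploy)) +
  geom_histogram(bins = 30)

p_line
```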


How to make a good plot?

  • It is easy to start with minimal viable code for a plot, but how does one make a cleaner and more polished graph?

    • Ironically, you need more lines of code to have a less cluttered plot
  • By using the rest of the components, such as scales and themes, one can add more features to the graph and also clean up unnecessary elements

  • Implement some good data visualization practices. Author Stephanie Evergreen has a 24-point checklist on her website, and a previous presentation in the user group shows how to incorporate that in ggplot2

Polishing the plot - Additional Geom

library(scales)

g3 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point() +                # Layer
             geom_smooth()                 # Layer (additional)
  
g3
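Called with no arguments, geom_smooth() picks a smoothing method itself and prints a message saying which one it used. A minimal sketch of making that choice explicit follows; it uses the economics dataset bundled with ggplot2 as stand-in data, and the two methods shown are examples rather than the settings behind the slide.

```r
library(ggplot2)

base <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_point()

# Local regression (loess) with the confidence ribbon switched off
p_loess <- base +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Straight-line fit with the confidence ribbon kept
p_lm <- base +
  geom_smooth(method = "lm", formula = y ~ x)

p_loess
```

Spelling out method and formula also silences the informational message, which keeps rendered reports tidy.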

Polishing the plot - Scaling layer

library(scales)

g4 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point() +                # Layer
             geom_smooth() +               # Layer (additional)
             scale_y_continuous(
               labels = label_number(
                 suffix = "K", scale = 1e-3)
               )                           # Scale
  
g4
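It can help to try the labelling function on its own before putting it in a scale. label_number() returns a function, and calling that function on a few values shows the text the axis will display. The input values below are made up for illustration.

```r
library(scales)

# Same settings as in the scale above: divide by 1,000 and append "K"
thousands <- label_number(suffix = "K", scale = 1e-3)

# Apply the labeller directly to some example axis breaks
thousands(c(0, 25000, 50000))
```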

Polishing the plot - Theme layer

library(scales)

g5 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point() +                # Layer
             geom_smooth() +               # Layer (additional)
             scale_y_continuous(
               labels = label_number(
                 suffix = "K", scale = 1e-3)
               ) +                         # Scale
             theme_minimal()               # Theme
g5
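A complete theme such as theme_minimal() can be adjusted further with theme(). The sketch below uses the economics dataset bundled with ggplot2 as stand-in data, and the specific tweaks are only examples of what is possible.

```r
library(ggplot2)

p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  theme_minimal(base_size = 14) +          # complete theme, larger base font
  theme(
    panel.grid.minor    = element_blank(), # drop the minor grid lines
    plot.title.position = "plot"           # align titles with the whole plot
  )

p
```

Order matters here: theme() tweaks should come after the complete theme, otherwise the complete theme overwrites them.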

Polishing the plot - Specifying Labels

library(scales)

g6 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point() +                # Layer
             geom_smooth() +               # Layer (additional)
             scale_y_continuous(
               labels = label_number(
                 suffix = "K", scale = 1e-3)
               ) +                         # Scale
             theme_minimal() +             # Theme
             labs(title = "ggplot2's rise in popularity over the past decade",
                  subtitle = "Number of daily downloads from CRAN",
                         x = NULL,
                         y = NULL)         # Labels
g6

Polishing the plot - Modify Colour

library(scales)

g7 <- ggplot(ggplot_downloads_df,          # Data
             aes(x = date, y = count)) +   # Mapping
             geom_point(colour = "lightgrey") + # Layer
             geom_smooth() +               # Layer (additional)
             scale_y_continuous(
               labels = label_number(
                 suffix = "K", scale = 1e-3)
               ) +                         # Scale
             theme_minimal() +             # Theme
             labs(title = "ggplot2's rise in popularity over the past decade",
                  subtitle = "Number of daily downloads from CRAN",
                         x = NULL,
                         y = NULL)         # Labels
g7
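The colour in the code above is set as a constant, which is different from mapping colour to a variable. A short sketch of the distinction, using the built-in mtcars dataset as stand-in data:

```r
library(ggplot2)

# Setting: a constant outside aes() paints every point the same colour
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "lightgrey")

# Mapping: a variable inside aes() gives one colour per group plus a legend
p_map <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()

p_map
```

Putting a colour name inside aes() is a frequent mistake: ggplot2 then treats the name as data and assigns it a default palette colour instead of the one requested.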

Where can ggplot2 visuals be used?

  • RStudio

  • HTML documents

  • Word/PowerPoint files

  • Markdown documents

  • Books

  • Websites

  • Interactive apps

  • Copy and paste them into emails, reports, briefing notes, etc.
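For several of these destinations, the plot first has to be written to an image file, and ggsave() is one way to do that. In the sketch below the plot, the temporary file path, and the dimensions are all illustrative assumptions.

```r
library(ggplot2)

# Stand-in plot built from the economics dataset bundled with ggplot2
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Hypothetical output location; replace with a real path such as "plot.png"
out_file <- tempfile(fileext = ".png")

# The file extension decides the format; size is given in inches here
ggsave(out_file, plot = p, width = 8, height = 4.5, units = "in", dpi = 300)

file.exists(out_file)
```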

Suggested avenues to explore

  • Experiment with different types of graphs
  • Change default parameters
  • Plot maps
  • Play around with various ggplot2 extensions
  • Learn how to incorporate ggplot2 graphs in R Markdown/Quarto reports
  • Implement reactive ggplot2 objects within Shiny apps
  • Explore interactive plots via the ggiraph package or the ggplotly() function from the plotly package
  • Compare and contrast ggplot2 with base R or different libraries
  • Match Government of Canada look and feel
  • Make visuals colour-blind friendly and generally compliant with accessibility standards
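As one possible starting point for the colour-blind-friendly item, ggplot2 includes the viridis palettes. The sketch below uses the built-in mtcars dataset as stand-in data. A palette choice alone does not make a figure compliant with any accessibility standard, so treat this as a first step only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_viridis_d(name = "Cylinders") +  # discrete viridis palette
  theme_minimal()

p
```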

Resources